Conversation
Greptile Summary
This PR extracts hardcoded Gemini prompts from `main.py` into a dedicated prompt module. Key improvements:
Issues already flagged in previous comments:
Confidence Score: 3/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Client Request] --> B[FastAPI Endpoint<br/>/api/v1/generate]
B --> C[build_study_generation_prompt]
C --> D{include_examples?}
D -->|Yes| E[Add SYSTEM_PROMPT +<br/>EXAMPLES + OUTPUT_FORMAT]
D -->|No| F[Add SYSTEM_PROMPT +<br/>OUTPUT_FORMAT only]
E --> G[Send to Gemini API]
F --> G
G --> H[Receive JSON Response]
H --> I[clean_response<br/>Remove markdown blocks]
I --> J[json.loads]
J --> K[validate_data<br/>Check required fields]
K --> L[validate_quiz_quality<br/>Check for duplicates/errors]
L --> M[Return GenerateResponse]
M --> N[Client receives study pack]
J -.->|JSONDecodeError| O[500 Error]
K -.->|ValueError| O
G -.->|None response| O
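The fence-stripping step (`clean_response`) in the flowchart above is not shown in the diff; here is a minimal sketch of what such a helper might look like, assuming Gemini sometimes wraps its JSON in Markdown code fences (the implementation details are hypothetical):

```python
import json

def clean_response(raw: str) -> str:
    """Strip Markdown code fences that Gemini sometimes wraps around JSON."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (possibly "```json")
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence, if present
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    return text.strip()

raw = "```json\n{\"summary\": [], \"quiz\": []}\n```"
data = json.loads(clean_response(raw))
```

If the cleaned string still fails `json.loads`, the flow above falls through to the 500 error path.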
Last reviewed commit: 61b7a24

@greptileai review this again and give me the updated confidence score
Greptile Overview
Greptile Summary
Refactors prompt logic from an inline string in `main.py` into `backend/prompts/study_gen_v1.py`. Key improvements:
Issues found:
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| backend/prompts/study_gen_v1.py | New centralized prompt module with versioning, few-shot examples, and validation. Minor: missing `__init__.py` for proper package structure |
| backend/main.py | Integrates new prompt system. Issues: Python 3.9+ type syntax breaks 3.8 compatibility, commented code should be removed |
Sequence Diagram
sequenceDiagram
participant Client as Frontend
participant API as /api/v1/generate
participant Prompt as study_gen_v1.py
participant Gemini as GeminiService
participant Validator as validate_quiz_quality()
Client->>API: POST /api/v1/generate<br/>{text: "user notes"}
API->>API: Validate request.text<br/>(not empty)
API->>Prompt: build_study_generation_prompt()<br/>(user_notes, include_examples=True)
Prompt->>Prompt: Construct prompt with:<br/>SYSTEM_PROMPT + EXAMPLES +<br/>user_notes + OUTPUT_FORMAT
Prompt-->>API: Complete prompt string
API->>Gemini: call_gemini(prompt)
Gemini-->>API: JSON response<br/>(may have markdown wrapping)
API->>API: Clean markdown fences<br/>(remove ```json markers)
API->>API: Parse JSON and validate<br/>(summary, quiz fields)
API->>Validator: validate_quiz_quality(quiz_data)
Validator-->>API: warnings list<br/>(logged, non-blocking)
API-->>Client: GenerateResponse<br/>{summary: [], quiz: []}
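The diff for `validate_quiz_quality()` is not shown here; the following is a plausible sketch of the non-blocking checks the review describes (duplicate options, invalid answers). Details such as the expected four-option count are assumptions, not confirmed by the PR:

```python
def validate_quiz_quality(quiz):
    """Return a list of warning strings; never raises (non-blocking by design)."""
    warnings = []
    for i, q in enumerate(quiz):
        options = q.get("options", [])
        if len(set(options)) != len(options):
            warnings.append(f"question {i}: duplicate options")
        if q.get("answer") not in options:
            warnings.append(f"question {i}: answer not among options")
        if len(options) != 4:  # assumed convention; the real check may differ
            warnings.append(f"question {i}: expected 4 options, got {len(options)}")
    return warnings
```

As the diagram shows, the warnings are logged rather than returned to the client, so a low-quality quiz still produces a 200 response.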
Additional Comments (2)
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/main.py
Line: 54:55
Comment:
`list[str]` and `list[QuizQuestion]` use Python 3.9+ syntax. For Python 3.8 compatibility, import `List` from typing and use `List[str]` and `List[QuizQuestion]`
```suggestion
options: List[str]
answer: str
```
How can I resolve this? If you propose a fix, please make it concise.
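For illustration, here is the 3.8-compatible pattern applied across both models. These are dataclass stand-ins; the real code uses Pydantic `BaseModel`, and only the field names shown in the suggestions above are taken from the PR:

```python
from typing import List
from dataclasses import dataclass, field

@dataclass
class QuizQuestion:
    question: str
    options: List[str]  # typing.List works on Python 3.8; builtin list[str] needs 3.9+
    answer: str

@dataclass
class GenerateResponse:
    summary: List[str] = field(default_factory=list)
    quiz: List[QuizQuestion] = field(default_factory=list)

resp = GenerateResponse(summary=["point"], quiz=[QuizQuestion("q?", ["a", "b"], "a")])
```

Since Python 3.8 is itself end-of-life, the team may instead prefer to declare a minimum Python of 3.9 and keep the builtin generics.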
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/main.py
Line: 64:65
Comment:
Same Python 3.9+ syntax issue
```suggestion
summary: List[str]
quiz: List[QuizQuestion]
```
How can I resolve this? If you propose a fix, please make it concise.

@greptileai review this again with the new version of main and give me the confidence score as well
Greptile Overview
Greptile Summary
This PR centralizes the Gemini study-generation prompt into `backend/prompts/study_gen_v1.py`. Main issues to address before merge are around integration/robustness: the new import path in `backend/main.py` introduces an import/run-context mismatch, and raw AI output is logged to stdout.

Confidence Score: 2/5
| Filename | Overview |
|---|---|
| backend/README.md | Documents the new centralized study prompt module and how it’s used by /api/v1/generate. |
| backend/main.py | Switches generate endpoint to use the prompt builder and adds quiz quality warnings; introduces an import/run-context mismatch and logs raw AI output to stdout. |
| backend/prompts/study_gen_v1.py | Adds centralized prompt builders and quiz-quality validation; few-shot examples include non-JSON scaffolding that can induce non-JSON model outputs. |
Sequence Diagram
sequenceDiagram
autonumber
participant Client
participant API as FastAPI (/api/v1/generate)
participant Prompt as study_gen_v1.build_study_generation_prompt
participant Gemini as GeminiService.call_gemini
Client->>API: POST /api/v1/generate {text}
API->>Prompt: build_study_generation_prompt(user_notes, include_examples)
Prompt-->>API: prompt string
API->>Gemini: call_gemini(prompt)
Gemini-->>API: response.text (string)
API->>API: strip markdown fences + json.loads
API->>API: validate quiz items + validate_quiz_quality
API-->>Client: 200 {summary, quiz}
alt Gemini returns None
API-->>Client: 500 Failed to generate
else JSON decode / structure error
API-->>Client: 500 Failed to parse / Invalid format
end
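The prompt assembly shown in the diagram can be sketched as follows. The constants here are abbreviated stand-ins, not the module's real text; only the function name and parameters come from the PR:

```python
# Abbreviated stand-ins for the module's real constants.
SYSTEM_PROMPT = "You are a study assistant. Return ONLY valid JSON."
EXAMPLES = 'Example output: {"summary": ["..."], "quiz": []}'
OUTPUT_FORMAT = 'Respond with a JSON object containing "summary" and "quiz" keys.'

def build_study_generation_prompt(user_notes: str, include_examples: bool = True) -> str:
    parts = [SYSTEM_PROMPT]
    if include_examples:
        parts.append(EXAMPLES)
    parts.append(f"NOTES:\n{user_notes}")
    parts.append(OUTPUT_FORMAT)
    return "\n\n".join(parts)
```

The `include_examples` flag trades prompt length (and token cost) against the format reliability that few-shot examples buy.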
```python
# Optional: Run quality checks on the quiz
quality_warnings = validate_quiz_quality(data.get("quiz", []))
if quality_warnings:
    print(f"[generate] Quality warnings: {quality_warnings}")
    # Can log these or return them to the frontend in the future
```
Sensitive data logged to stdout
Path: backend/main.py
Line: 176:180
On invalid/failed responses, this endpoint prints `Raw response: {response}` and also prints `quality_warnings` unconditionally. Gemini output can contain user-provided notes verbatim, so this will leak user content into server logs. Since this PR adds additional logging paths, it should be gated/removed or switched to structured logging with redaction (and avoid printing raw model output).
How can I resolve this? If you propose a fix, please make it concise.
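One way to address this (a sketch, not the PR's actual fix; the `redact` helper and its length cap are hypothetical): log only a truncated preview through the `logging` module instead of printing raw model output:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("generate")

def redact(text: str, limit: int = 60) -> str:
    """Return a short preview so user notes never reach the logs verbatim."""
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... [truncated, {len(text)} chars total]"

raw_response = "model output that may echo the user's private notes " * 4
# Instead of print(f"Raw response: {response}"):
logger.warning("Gemini response failed validation: %s", redact(raw_response))
```

Logging only the length, or a hash, would be stricter still; the preview variant keeps just enough signal for debugging.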
backend/prompts/study_gen_v1.py
Outdated
```python
EXAMPLES = """Here are examples of correct output format:

--- EXAMPLE 1 ---

INPUT NOTES:
"Photosynthesis is the process by which plants convert sunlight into energy. It occurs in chloroplasts and requires carbon dioxide and water. The outputs are glucose and oxygen."

CORRECT OUTPUT:
```
Few-shot violates JSON-only rule
Path: backend/prompts/study_gen_v1.py
Line: 83:90
`SYSTEM_PROMPT` requires "ONLY valid JSON", but `EXAMPLES` includes headers like `--- EXAMPLE 1 ---`, `INPUT NOTES:`, and `CORRECT OUTPUT:`. Because the final prompt contains these non-JSON tokens, the model may mirror that structure and prepend similar headers in its response, directly causing `json.loads()` to fail. If strict JSON output is required, keep the few-shot content itself purely JSON (or clearly isolate examples as non-output content and add an explicit anti-pattern warning).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
How can I resolve this? If you propose a fix, please make it concise.
Force-pushed from 10937ad to 55e4f5e
Greptile Summary
This PR centralizes prompt engineering into `backend/prompts/study_gen_v1.py`. Critical issues:
Additional notes:
Confidence Score: 0/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Client POST /api/v1/generate] --> B[main.py: generate_study_materials]
B --> C[build_study_generation_prompt]
C --> D[SYSTEM_PROMPT]
C --> E[EXAMPLES few-shot]
C --> F[OUTPUT_FORMAT]
C --> G[user_notes]
D --> H[Complete Prompt]
E --> H
F --> H
G --> H
H --> I[gemini_service.call_gemini]
I --> J[Gemini API Response]
J --> K[clean_response]
K --> L[json.loads]
L --> M[validate_data]
M --> N[validate_quiz_quality]
N --> O[Return GenerateResponse]
style C fill:#90EE90
style I fill:#FFB6C1
style L fill:#FFD700
style M fill:#FFD700
Last reviewed commit: 55e4f5e
backend/README.md
Outdated
<<<<<<< HEAD
Run study pack test (requires Gemini API key):

```bash
python tests/test_studypack.py
```
=======

---

## Study Generation Prompt System (v1)

The backend now uses a centralized prompt module for AI study generation.

Location:

```
backend/prompts/study_gen_v1.py
```

This module defines how Gemini transforms raw notes into:

- bullet-point summaries
- multiple-choice quiz questions

### How it connects to the API

The `/api/v1/generate` endpoint builds a prompt using:

```python
prompt = build_study_generation_prompt(request.text)
```

Then sends it to Gemini:

```python
response = await gemini_service.call_gemini(prompt)
```

The prompt module handles:

- system instructions
- output schema
- few-shot examples
- formatting rules
- quality validation

### Prompt builder functions

#### Full study pack

```python
build_study_generation_prompt(notes)
```

Returns summary + quiz.

#### Quiz only

```python
build_custom_quiz_prompt(notes, num_questions=3)
```

Generates quiz-only output.

#### Summary only

```python
build_summary_only_prompt(notes)
```

Generates summary-only output.

### Versioning

Current version:

```
study_gen_v1.py → VERSION 1.0.0
```

Future prompt improvements should create:

```
study_gen_v2.py
study_gen_v3.py
```

Never silently change v1 behavior — version prompts explicitly.

### Quality checks

After Gemini responds, quiz output is validated using:

```python
validate_quiz_quality()
```

This detects:

- duplicate options
- invalid answers
- weak question structure

Warnings are logged for debugging.

### Editing prompts safely

If you change prompt behavior:

1. Keep the JSON schema identical
2. Do not change the API response format
3. Test with messy notes input
4. Verify the frontend still parses correctly

Breaking the schema can break the frontend.

---

>>>>>>> 39ed2b2 (Updated readme on study prompt generation)
Unresolved merge conflict markers are still present in the file - the build will fail.
backend/prompts/study_gen_v1.py
Outdated
```python
EXAMPLES = """Here are examples of correct output format:

--- EXAMPLE 1 ---

INPUT NOTES:
"Photosynthesis is the process by which plants convert sunlight into energy. It occurs in chloroplasts and requires carbon dioxide and water. The outputs are glucose and oxygen."

CORRECT OUTPUT:
{
  "summary": [
    "Photosynthesis converts sunlight into chemical energy (glucose) in plant cells",
    "Occurs in chloroplasts and requires CO₂ and H₂O as inputs",
    "Produces glucose for plant energy and oxygen as a byproduct"
  ],
  "quiz": [
    {
      "question": "Where does photosynthesis take place in plant cells?",
      "options": ["Mitochondria", "Chloroplasts", "Nucleus", "Cell wall"],
      "answer": "Chloroplasts"
    },
    {
      "question": "Which of the following is a product of photosynthesis?",
      "options": ["Carbon dioxide", "Water", "Oxygen", "Nitrogen"],
      "answer": "Oxygen"
    },
    {
      "question": "What is the primary energy source for photosynthesis?",
      "options": ["Heat", "Sunlight", "Chemical energy", "Wind"],
      "answer": "Sunlight"
    }
  ]
}

--- EXAMPLE 2 ---

INPUT NOTES:
"The water cycle includes evaporation, condensation, and precipitation. Water evaporates from oceans and lakes, forms clouds through condensation, and returns to Earth as rain or snow."

CORRECT OUTPUT:
{
  "summary": [
    "The water cycle is a continuous process of water movement on Earth",
    "Evaporation occurs when water from oceans and lakes becomes water vapor",
    "Condensation forms clouds, and precipitation returns water to Earth's surface"
  ],
  "quiz": [
    {
      "question": "What happens during evaporation in the water cycle?",
      "options": ["Water falls as rain", "Water becomes vapor", "Clouds form", "Ice melts"],
      "answer": "Water becomes vapor"
    },
    {
      "question": "Which process forms clouds in the water cycle?",
      "options": ["Evaporation", "Precipitation", "Condensation", "Filtration"],
      "answer": "Condensation"
    },
    {
      "question": "What are the main forms of precipitation?",
      "options": ["Steam and vapor", "Rain and snow", "Clouds and fog", "Rivers and lakes"],
      "answer": "Rain and snow"
    }
  ]
}"""
```
Few-shot examples contain non-JSON headers (`--- EXAMPLE 1 ---`, `INPUT NOTES:`, `CORRECT OUTPUT:`), but line 48 requires "ONLY valid JSON" - the model may mirror this structure and break parsing.
backend/main.py
Outdated
```python
from services import GeminiService

# Import the new prompt system
from backend.prompts.study_gen_v1 import (
```
Missing `__init__.py` in the `prompts/` directory - Python won't recognize it as a package, so the import will fail.
…-v1-2.3, fixes for main conflicts
```python
    "Examples of valid JSON output (do not copy any labels; return ONLY JSON):",
    EXAMPLES,
```
The instruction tells the model not to copy labels, but the `EXAMPLES` constant (lines 83-145) includes non-JSON headers like `--- EXAMPLE 1 ---`, `INPUT NOTES:`, and `CORRECT OUTPUT:`. The model may mirror this structure despite the warning. Consider restructuring the examples to be pure JSON, or making the warning more explicit.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
backend/prompts/study_gen_v1.py
Outdated
```python
    "",
    f"Process these notes and generate study materials:",
    "",
    "Return ONLY a JSON object. Do not include any headings, labels, or extra text.",
```
The instruction says "Return ONLY a JSON object" but immediately precedes the user notes. Consider placing this instruction AFTER the user notes, as the last thing before generation; that may be more effective for output-format compliance.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
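The reordering suggested above might look like the following sketch (the `assemble_prompt` name is hypothetical; the literal strings mirror the diff):

```python
def assemble_prompt(system_prompt: str, examples: str, user_notes: str) -> str:
    # Format instruction moved AFTER the user notes, as the last line
    # before generation, per the reviewer's suggestion.
    return "\n\n".join([
        system_prompt,
        examples,
        "Process these notes and generate study materials:",
        user_notes,
        "Return ONLY a JSON object. Do not include any headings, labels, or extra text.",
    ])

prompt = assemble_prompt("system", "examples", "my notes")
```

Whether instruction placement actually helps is model-dependent; it is worth A/B testing against messy notes before adopting it.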
Additional Comments (2)
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Feature (PromptV1): Centralized study generation prompt system SOC-2.3
🎉 New feature (Extends backend architecture, non-breaking feature)
PR Summary
Adds a centralized, versioned prompt system for AI study generation.
The Gemini prompt is moved out of `main.py` into a dedicated module with strict JSON enforcement and few-shot examples to improve reliability without changing the API contract.

Overview
What feature/problem does this PR address?
- Prompt hardcoded in `main.py` was hard to maintain

What approach was taken?
- `prompts/study_gen_v1.py` as the single source of truth
- `build_study_generation_prompt()` to construct prompts

Important design decisions / trade-offs
Files Changed
Test Cases / Edge Cases
Checklist
Additional Notes
- Future prompt improvements should create `study_gen_v2.py` instead of modifying v1

Jira Ticket
Jira Ticket(s) - [SOC-2.3]